feat: Add FFI_QueryPlanner to support foreign query planners across shared-library boundaries#22151
feat: Add FFI_QueryPlanner to support foreign query planners across shared-library boundaries#22151milenkovicm wants to merge 1 commit into
FFI_QueryPlanner to support foreign query planners across shared-library boundaries#22151Conversation
FFI_QueryPlanner to support foreign query planners across shared-library boundariesFFI_QueryPlanner to support foreign query planners across shared-library boundaries
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
ee61d0f to
8e23f2c
Compare
Co-authored-by: Tim Saucer <timsaucer@gmail.com>
8e23f2c to
d49d1bc
Compare
|
At the moment I believe this is more like a hack to enable ballista working with datafusion python, than a systematic solution. Perhaps it should be moved to datafusion python to support that single use case until datafusion makes decision do we use Thus, I'd like to suggest moving this functionality to datafusion python. Would appreciate your opinion @timsaucer, @alamb & @andygrove |
|
As I was digging in, I found what I mentioned before wanting to change around. The root problem that I think needs to be reworked is making it so that we can pass If we did this then I think the entire approach in this crate would be a lot cleaner. I'm discovering right now that we also have an issue with the new scalar subquery data structure, so I've got my agent working away at that right now. |
|
Yes, that’s the one I had trouble with last time I tried this. |
Sounds like a reasonable approach to me |
Which issue does this PR close?
This is follow up on #20249 which was co-authored with @timsaucer
Closes #. (pending)
Rationale for this change
DataFusion's FFI layer allows plugins and foreign libraries to extend query processing across shared-library boundaries. However, there was no way for a foreign library to participate in the physical planning pipeline —
QueryPlannerrequired a concrete &SessionState, which cannot cross an FFI boundary.PR rallies on
dyn Sessionwhich limits use of this functionality, asPhysicalPlanner, which in most cases would be called, expectsSessionStateand in FFI environments it would be unsafe to castdyn SessiontoSessionState. Proposed change would unblock Ballista integration with DF python, but apart from that, I'm not sure how useful it would be until we align other important interfaces to usedyn Session.What changes are included in this PR?
FFI_QueryPlanner— an FFI-saferepr-Cstruct that wraps a foreignQueryPlannerimplementation, serializing theLogicalPlanvia datafusion-proto across the boundary and deserializing the resultingExecutionPlan.QueryPlannerWeaktrait — a variant ofQueryPlannerthat accepts&dyn Sessioninstead of&SessionState, making it implementable by foreign code. It make more sense to create new interface rather than changingQueryPlanner. The change would introduce big backward incompatibility, and limiting use cases this functionality could be used, Also, we can argue that havingSessionStateas parameter makes sense.QueryPlannerwith anAnybound to allow downcasting (needed byDynamicForeignQueryPlanner).FFI_QueryPlannerviaQueryPlannerWeakand the deferred-injection pattern viaDynamicForeignQueryPlanner.The
DynamicForeignQueryPlannerhelper (in the integration tests) shows how to break the construction-time dependency cycle:SessionContextneeds aQueryPlannerat build time, but theFFIplanner needs the codec which needs the context. The placeholder is registered at build time and swapped for the real FFI planner once both sides exist.Are these changes tested?
Yes. Integration tests are added in
datafusion/ffi/tests/ffi_planner.rs:test_ffi_query_planner— verifies that a foreign planner loaded from a shared library can produce a valid ExecutionPlan from a LogicalPlan.test_ffi_dynamic_query_planner— verifies the deferred-injection pattern where the FFI planner is installed into an already-constructed SessionContext viaDynamicForeignQueryPlanner.Are there any user-facing changes?
Yes, one change to a public trait:
QueryPlannernow requires Any as a supertrait (enables downcasting). This is not required for this implementation but it might be needed for implementorsOpen Questions
SessionStateanddyn Sessionusage across internal interfaces & (physical) planners makes this proposal not really useful until we move all relevant traits todyn Session; question is do we move this to datafusion-python to enable ballista use cases?SessionContextis needed to create aQueryPlannerbutQueryPlanneris required to create aSessionContext(I hope I did not get that horrible wrong)QueryPlannerWeakis debatable, I cant come up with better name